Jaipur is the capital and the largest city of the Indian state of Rajasthan. As of 2011, the city had a population of 3.1 million, making it the tenth most populous city in the country. Jaipur is also known as the Pink City, due to the dominant colour scheme of its buildings. It is located 268 km (167 miles) from the national capital New Delhi. Jaipur is a popular tourist destination in India and forms a part of the west Golden Triangle tourist circuit along with Delhi and Agra (240 km, 149 mi). It also serves as a gateway to other tourist destinations in Rajasthan such as Jodhpur (348 km, 216 mi), Jaisalmer (571 km, 355 mi), Udaipur (421 km, 262 mi), Kota (252 km, 156 mi) and Mount Abu (520 km, 323 mi). Jaipur is located 616 km from Shimla.
The aim of this project is to identify venues in Jaipur, India, based on their ratings and average prices. In this notebook, we identify venues in the city using the Foursquare API and the Zomato API, to help visitors select the restaurants that suit them best.
When people visit a city, they start looking for places to go during their stay. They mainly compare venues by rating and by average price, so that the places they pick fit their budget.
Here, we'll identify places that suit different kinds of visitors, based on the information collected from the two APIs and on data science techniques. Once the venues are plotted, any company could launch an application that uses the same data to suggest such places to its users.
The target audience for this project is twofold. First, anyone visiting Jaipur, the Pink City of Rajasthan, India, can use the plots and maps from this project to quickly select places that suit their budget and rating preferences. Second, an entrepreneur who wants to open new restaurants in Jaipur can use this information to create a website or a mobile application, updated on a regular basis, that serves visitors to the city, or even extend the same functionality to other places.
We will use data science tools and techniques to weigh the pros and cons of a location. The analysis lets stakeholders make a data-driven decision about the most promising and viable category, location and price range in the city.
# Install third-party packages that may be missing from the notebook environment
!pip install geopy
!pip install folium

import json

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.cluster import KMeans

print('libraries imported')
address='Jaipur'
geolocator=Nominatim(user_agent='japur_explorer')
location=geolocator.geocode(address,timeout=None)
print(location)
latitude=location.latitude
longitude=location.longitude
print('{} is latitude. {} is longitude'.format(latitude,longitude))
CLIENT_ID = 'RQ5VU21QNUA5ANUCLIGEMSXGHJ43XSGUP3PCXHB4H5KIB1NF' # your Foursquare ID
CLIENT_SECRET = 'SAAMYVHGNGQCTB3QWYDPSLMVJJD1FSIGYYJSMTLUR2EEQK1W' # your Foursquare Secret
VERSION = '20200602'  # Foursquare API version date (YYYYMMDD)
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
radius = 20000  # search radius in metres
LIMIT = 150  # maximum number of venues to request
# Fill every query parameter from the variables above instead of hardcoding the values
fsq_url = ('https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}'
           '&v={}&ll={},{}&radius={}&limit={}').format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    latitude,
    longitude,
    radius,
    LIMIT)
fsq_request=requests.get(fsq_url).json()
fsq_venues=json_normalize(fsq_request['response']['groups'][0]['items'])
fsq_venues.head()
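def get_cat_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

column_names = ['venue.id', 'venue.location.lat', 'venue.location.lng', 'venue.name', 'venue.categories']
# Copy the selection so the assignment below does not act on a view of fsq_venues
fsq_ven_df = fsq_venues[column_names].copy()
fsq_ven_df['venue.categories'] = fsq_ven_df.apply(get_cat_type, axis=1)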
fsq_ven_df.head()
jaipur_map=folium.Map(location=[latitude,longitude],zoom_start=14)
jaipur_map
for lat, long, name in zip(fsq_ven_df['venue.location.lat'], fsq_ven_df['venue.location.lng'], fsq_ven_df['venue.name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='Yellow',
        fill=True,
        fill_color='#34867b',
        fill_opacity=0.7,
        parse_html=True).add_to(jaipur_map)
jaipur_map
import requests
headers = {
'Accept': 'application/json',
'user-key': '6caad1604d0957334c32550f321d2e5b',
}
venues_information = []
for name, lat, lng in zip(fsq_ven_df['venue.name'], fsq_ven_df['venue.location.lat'], fsq_ven_df['venue.location.lng']):
    # Let requests URL-encode the query parameters (venue names may contain spaces or '&')
    params = {'q': name, 'start': 0, 'lat': lat, 'lon': lng, 'radius': 3000}
    response = requests.get('https://developers.zomato.com/api/v2.1/search',
                            headers=headers, params=params).json()
    if len(response['restaurants']) > 0:
        for item in response['restaurants']:
            restaurant = item['restaurant']
            zom_venue = [
                restaurant['name'],
                restaurant['location']['latitude'],
                restaurant['location']['longitude'],
                restaurant['average_cost_for_two'],
                restaurant['price_range'],
                restaurant['user_rating']['aggregate_rating'],
                restaurant['location']['address'],
                restaurant['cuisines'],
            ]
            venues_information.append(zom_venue)
    else:
        # No match: add a placeholder row with one entry per output column
        venues_information.append([0, 0, 0, 0, 0, 0, '', ''])
zomato_venues = pd.DataFrame(venues_information,
columns = ['venue', 'latitude',
'longitude', 'price_for_two',
'price_range', 'rating', 'address','cuisines'])
zomato_venues.head()
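The search step above sends one query per venue name, and names can contain spaces, ampersands or accented characters that are not safe in a raw query string. The following is a minimal sketch of how such a URL can be built with percent-encoding using only the standard library. The helper name build_search_url and the sample venue name are assumptions made for illustration; only the endpoint and parameter names come from the notebook.

```python
from urllib.parse import urlencode

# Endpoint copied from the notebook's search call.
ZOMATO_SEARCH_URL = 'https://developers.zomato.com/api/v2.1/search'


def build_search_url(name, lat, lon, radius=3000):
    """Return the search URL with every query parameter percent-encoded.

    build_search_url is a hypothetical helper, not part of any API client.
    """
    query = urlencode({'q': name, 'start': 0, 'lat': lat, 'lon': lon, 'radius': radius})
    return '{}?{}'.format(ZOMATO_SEARCH_URL, query)


# Illustrative venue name containing spaces and an ampersand.
print(build_search_url('Tea & Toast', 26.9124, 75.7873))
```

Passing the same dictionary to requests.get through its params argument gives the same encoding without building the string by hand.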
zomato_venues['latitude']=zomato_venues['latitude'].astype(float)
zomato_venues['longitude']=zomato_venues['longitude'].astype(float)
jaipur_map=folium.Map(location=[latitude,longitude],zoom_start=14)
jaipur_map
for lat, long, name in zip(zomato_venues['latitude'], zomato_venues['longitude'], zomato_venues['venue']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='Yellow',
        fill=True,
        fill_color='#36167b',
        fill_opacity=0.7,
        parse_html=True).add_to(jaipur_map)
jaipur_map
The extraction returned 496 restaurant venues. The collected data contains many duplicate venues as well as venues from other cities. We dropped the duplicates and kept only the restaurants whose address contains the city name ‘Jaipur’. The final dataset has 284 restaurants in Jaipur, with no redundant entries.
print('Total Venues:',zomato_venues.shape[0])
# Keep only venues whose address mentions Jaipur; copy so later assignments do not act on a view
jaipur_zom = zomato_venues[zomato_venues['address'].str.contains('Jaipur', na=False)].copy()
print('Total Venues available in Jaipur:', jaipur_zom.shape[0])
jaipur_zom = jaipur_zom.drop_duplicates(subset='venue', keep='first')
print('Total Venues in Jaipur after dropping duplicates:', jaipur_zom.shape[0])
jaipur_zom['rating']=jaipur_zom['rating'].astype(float)
jaipur_zom['venue']=jaipur_zom['venue'].astype(str)
jaipur_zom['latitude']=jaipur_zom['latitude'].astype(float)
jaipur_zom['longitude']=jaipur_zom['longitude'].astype(float)
jaipur_zom.head()
Let's define a central location in the city and calculate the distance from it to each of the venues.
#Central location of Jaipur
Lat_central=26.9124
Long_central=75.7873
folium.Map(location=[Lat_central,Long_central])
from geopy.distance import geodesic

def get_distance(lat, long):
    """Geodesic distance in kilometres from the central location to (lat, long)."""
    grt = (Lat_central, Long_central)
    curr = (lat, long)
    return geodesic(grt, curr).km

distance_list = []
for i, j in zip(jaipur_zom['latitude'], jaipur_zom['longitude']):
    distance_list.append(get_distance(i, j))
jaipur_zom['distance'] = distance_list
jaipur_zom.head()
jaipur_zom['distance']=jaipur_zom['distance'].round(2)
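The distances above come from geopy's geodesic calculation. As a rough cross-check that needs no extra package, the haversine formula gives the great-circle distance on a sphere. The sketch below is an approximation under that spherical assumption, not the method the notebook uses. The function name haversine_km is hypothetical, and the two points are the geocoded city coordinates from the Foursquare request and the central location defined above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; treating the Earth as a sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Geocoded city point (from the Foursquare request) versus the chosen central point.
d = haversine_km(26.916194, 75.820349, 26.9124, 75.7873)
print(round(d, 2), 'km')
```

Over distances of a few kilometres the spherical and geodesic results should differ only slightly, so a large disagreement would point to swapped latitude and longitude or a unit mix-up.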
In this project we propose profiles of restaurants around a defined central location, based on rating, price range and distance from that location. The central location is the point (26.9124, 75.7873) defined above.
First we get all the venues from the Foursquare API. We then pass those venues to the Zomato API to get the restaurants with user ratings, calculate the distance from the defined central location to each restaurant, and cluster the restaurants on rating, distance and price range, within a 3 km range of the central location, to profile them.
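The clustering step described above compares venues across columns on very different scales, for example the price for two versus a rating between 0 and 5. One common option, which this notebook does not apply, is to standardize each column to zero mean and unit variance first, so that no single column dominates the Euclidean distance. The sketch below shows that idea in plain Python; the function name standardize and the sample rows are made up for illustration.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1 so it maps to all zeros.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up rows for illustration: (rating, price_for_two, distance in km).
toy = [
    (4.1, 400, 1.2),
    (3.8, 1800, 2.5),
    (4.4, 700, 0.6),
]
for row in standardize(toy):
    print([round(v, 2) for v in row])
```

With scikit-learn, sklearn.preprocessing.StandardScaler performs the same transformation on a DataFrame or array before it is passed to KMeans.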
We will explore the data based on the rating.
distribution_rating=jaipur_zom['rating'].value_counts().sort_index()
color=cm.rainbow((np.linspace(0,1,12)))
plt.figure(figsize = (30, 12))
plt.bar(distribution_rating.index,distribution_rating.values,color=color)
plt.xlabel("Rating", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given rating", fontsize = 30)
From the above plot we can see that most restaurant ratings range from 2.5 to 5. About 15 restaurants have a rating of 0. Let's explore the subset of the dataset with a rating of 0.
sub_jaipur=jaipur_zom[(jaipur_zom['rating']==0) & (jaipur_zom['distance']<=2000)]
sub_jaipur
cui=sub_jaipur['cuisines'].value_counts().sort_index()
color=cm.rainbow((np.linspace(0,1,12)))
plt.figure(figsize = (30, 12))
plt.bar(cui.index,cui.values,color=color)
plt.xticks(rotation='vertical',fontsize=26)
plt.xlabel("cuisines", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given cuisines", fontsize = 30)
price_range=jaipur_zom['price_range'].value_counts().sort_index()
plt.figure(figsize = (30, 12))
plt.bar(price_range.index,price_range.values)
plt.xlabel("Price Range", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given Price Range", fontsize = 30)
From the cuisine plot above we can see that the restaurants with a rating of 0 serve more or less the same kinds of cuisine, so we cannot tell from it which cuisine or category in particular is rated poorly.
The price range plot shows that most restaurants in Jaipur are pocket friendly. Zomato's price_range runs from 1 to 4 (1 being pocket friendly, 4 being expensive).
average_prices = jaipur_zom['price_for_two'].value_counts().sort_index()
plt.figure(figsize = (20, 12))
plt.scatter(average_prices.index,
average_prices.values,
s = average_prices.index*10,
c = cm.rainbow(np.linspace(0, 1, len(average_prices.index))))
plt.xlabel("Price for two", fontsize = 13)
plt.ylabel("Venue count", fontsize = 13)
plt.title("Count of venues with given average price", fontsize = 13)
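# distance is in kilometres (geodesic(...).km), so this threshold does not exclude any Jaipur venue
final_ven = jaipur_zom[(jaipur_zom['distance'] <= 2000) & (jaipur_zom['rating'] != 0) & (jaipur_zom['price_for_two'] != 0)]
final_ven.shape
# Drop the text columns; KMeans needs numeric features only
final_ven = final_ven.drop(['venue', 'address', 'cuisines'], axis=1)
final_ven.head()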
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, max_iter=300, n_init=10, random_state=0)
    kmeans.fit(final_ven)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
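The quantity plotted on the elbow curve is scikit-learn's inertia_, the sum of squared distances from each sample to its closest cluster centre. To make that concrete, here is a small plain-Python sketch that computes the same kind of within-cluster sum of squares for a given labelling, using each cluster's mean as its centre. The function name wcss_for_labels and the four sample points are made up for illustration.

```python
def wcss_for_labels(points, labels):
    """Sum of squared Euclidean distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        # Centroid = column-wise mean of the cluster's members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


# Made-up 2-D points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = wcss_for_labels(points, [0, 0, 0, 0])
two_clusters = wcss_for_labels(points, [0, 0, 1, 1])
print(one_cluster, two_clusters)
```

The value drops sharply when the labelling matches the real grouping and falls more slowly after that, which is the bend the elbow plot is read for.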
#cluster the areas based on the various features of restaurants
from sklearn.cluster import KMeans
NO_OF_CLUSTERS = 4
clustering = final_ven
kMeans = KMeans(n_clusters = NO_OF_CLUSTERS, random_state = 0).fit(clustering)
kMeans.labels_
clustered_df=jaipur_zom[(jaipur_zom['distance']<=2000) &(jaipur_zom['rating']!=0)& (jaipur_zom['price_for_two']!=0)]
clustered_df.insert(0, 'cluster_labels', kMeans.labels_)
clustered_df.head()
cluster_0=clustered_df[clustered_df['cluster_labels']==0]
print('Mean distance from the central location for cluster 0: ',clustered_df[clustered_df['cluster_labels']==0]['distance'].mean())
print('Mean rating of cluster 0: ',clustered_df[clustered_df['cluster_labels']==0]['rating'].mean())
print('Mean price rate for an individual of cluster 0: ',(clustered_df[clustered_df['cluster_labels']==0]['price_for_two']/2).mean())
print('There are {} restaurants in cluster 0 '.format(clustered_df[clustered_df['cluster_labels']==0]['rating'].shape[0]))
print('{} % of restaurants are in cluster 0 '.format(round((cluster_0.shape[0]/clustered_df.shape[0])*100,2)))
cui=cluster_0['cuisines'].value_counts().sort_index()
color=cm.rainbow((np.linspace(0,1,12)))
plt.figure(figsize = (30, 12))
plt.bar(cui.index,cui.values,color=color)
plt.xticks(rotation='vertical',fontsize=20)
plt.xlabel("cuisines", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given cuisines", fontsize = 30)
cluster_1=clustered_df[clustered_df['cluster_labels']==1]
print('Mean distance from the central location for cluster 1: ',clustered_df[clustered_df['cluster_labels']==1]['distance'].mean())
print('Mean rating of cluster 1: ',clustered_df[clustered_df['cluster_labels']==1]['rating'].mean())
print('Mean price rate for an individual of cluster 1: ',(clustered_df[clustered_df['cluster_labels']==1]['price_for_two']/2).mean())
print('There are {} restaurants in cluster 1 '.format(clustered_df[clustered_df['cluster_labels']==1]['rating'].shape[0]))
print('{} % of restaurants are in cluster 1 '.format(round((cluster_1.shape[0]/clustered_df.shape[0])*100,2)))
cui=cluster_1['cuisines'].value_counts().sort_index()
color=cm.rainbow((np.linspace(0,1,12)))
plt.figure(figsize = (30, 12))
plt.bar(cui.index,cui.values,color=color)
plt.xticks(rotation='vertical',fontsize=20)
plt.xlabel("cuisines", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given cuisines", fontsize = 30)
cluster_1
cluster_2=clustered_df[clustered_df['cluster_labels']==2]
print('Mean distance from the central location for cluster 2: ',clustered_df[clustered_df['cluster_labels']==2]['distance'].mean())
print('Mean rating of cluster 2: ',clustered_df[clustered_df['cluster_labels']==2]['rating'].mean())
print('Mean price rate for an individual of cluster 2: ',(clustered_df[clustered_df['cluster_labels']==2]['price_for_two']/2).mean())
print('There are {} restaurants in cluster 2 '.format(clustered_df[clustered_df['cluster_labels']==2]['rating'].shape[0]))
print('{} % of restaurants are in cluster 2 '.format(round((cluster_2.shape[0]/clustered_df.shape[0])*100,2)))
cui=cluster_2['cuisines'].value_counts().sort_index()
color=cm.rainbow((np.linspace(0,1,12)))
plt.figure(figsize = (30, 12))
plt.bar(cui.index,cui.values,color=color)
plt.xticks(rotation='vertical',fontsize=20)
plt.xlabel("cuisines", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given cuisines", fontsize = 30)
cluster_2
cluster_3=clustered_df[clustered_df['cluster_labels']==3]
print('Mean distance from the central location for cluster 3: ',clustered_df[clustered_df['cluster_labels']==3]['distance'].mean())
print('Mean rating of cluster 3: ',clustered_df[clustered_df['cluster_labels']==3]['rating'].mean())
print('Mean price rate for an individual of cluster 3: ',(clustered_df[clustered_df['cluster_labels']==3]['price_for_two']/2).mean())
print('There are {} restaurants in cluster 3 '.format(clustered_df[clustered_df['cluster_labels']==3]['rating'].shape[0]))
print('{} % of restaurants are in cluster 3 '.format(round((cluster_3.shape[0]/clustered_df.shape[0])*100,2)))
cui=cluster_3['cuisines'].value_counts().sort_index()
color=cm.rainbow((np.linspace(0,1,12)))
plt.figure(figsize = (30, 12))
plt.bar(cui.index,cui.values,color=color)
plt.xticks(rotation='vertical',fontsize=20)
plt.xlabel("cuisines", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Count of venues with given cuisines", fontsize = 30)
cluster_3
cluster_0=clustered_df[clustered_df['cluster_labels']==0]
cluster_0.head()
for lat, long, name, distance, rating in zip(cluster_0['latitude'], cluster_0['longitude'], cluster_0['venue'], cluster_0['distance'], cluster_0['rating']):
    label = '{} , \n distance from the center {} km ,rating: {}'.format(name, distance, rating)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3144aa',
        fill_opacity=0.7,
        parse_html=True).add_to(jaipur_map)
jaipur_map
cluster_1=clustered_df[clustered_df['cluster_labels']==1]
cluster_1.head()
# Ratings must come from cluster_1 itself so each popup shows the right venue's rating
for lat, long, name, distance, rating in zip(cluster_1['latitude'], cluster_1['longitude'], cluster_1['venue'], cluster_1['distance'], cluster_1['rating']):
    label = '{} , \n distance from the center {} km ,rating: {}'.format(name, distance, rating)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3144aa',
        fill_opacity=0.7,
        parse_html=True).add_to(jaipur_map)
jaipur_map
cluster_2=clustered_df[clustered_df['cluster_labels']==2]
cluster_2.head()
# Ratings must come from cluster_2 itself so each popup shows the right venue's rating
for lat, long, name, distance, rating in zip(cluster_2['latitude'], cluster_2['longitude'], cluster_2['venue'], cluster_2['distance'], cluster_2['rating']):
    label = '{} , \n distance from the center {} km ,rating: {}'.format(name, distance, rating)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#8186aa',
        fill_opacity=0.7,
        parse_html=True).add_to(jaipur_map)
jaipur_map
cluster_3=clustered_df[clustered_df['cluster_labels']==3]
cluster_3.head()
for lat, long, name, distance, rating in zip(cluster_3['latitude'], cluster_3['longitude'], cluster_3['venue'], cluster_3['distance'], cluster_3['rating']):
    label = '{} , \n distance from the center {} km ,rating: {}'.format(name, distance, rating)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3468aa',
        fill_opacity=0.7,
        parse_html=True).add_to(jaipur_map)
jaipur_map
Based on the analysis we can draw several conclusions. We have profiled the clusters by price range, rating, average price for two and distance from the center.
The analysis suggests that clusters 0 and 1 contain budget restaurants for middle-income diners, with ratings of around 4. They are good dining options for tourists and for anyone who wants to eat out in Jaipur. They are also a good option for a business person looking to open a restaurant in this area.
Cluster 3 is a little more expensive than clusters 0 and 1, with the same rating of about 4.
Cluster 2 is very expensive, with a rating of 3.8.
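The per-cluster figures behind these statements (count, share, mean rating, mean price per person and mean distance) are computed one cluster at a time earlier in the notebook. A minimal sketch of a single helper that produces the same profile for every cluster is shown below. The function name summarize_clusters and the sample records are made up for illustration; only the column names come from the notebook.

```python
from statistics import mean


def summarize_clusters(records):
    """Group venue records by 'cluster_labels' and return per-cluster averages."""
    groups = {}
    for rec in records:
        groups.setdefault(rec['cluster_labels'], []).append(rec)

    total = len(records)
    summary = {}
    for label, members in sorted(groups.items()):
        summary[label] = {
            'count': len(members),
            'share_pct': round(100 * len(members) / total, 2),
            'mean_rating': round(mean(m['rating'] for m in members), 2),
            # price_for_two is halved to get a per-person figure
            'mean_price_per_person': round(mean(m['price_for_two'] / 2 for m in members), 2),
            'mean_distance_km': round(mean(m['distance'] for m in members), 2),
        }
    return summary


# Made-up records for illustration only; not taken from the Zomato data.
toy = [
    {'cluster_labels': 0, 'rating': 4.0, 'price_for_two': 400, 'distance': 1.0},
    {'cluster_labels': 0, 'rating': 4.2, 'price_for_two': 600, 'distance': 3.0},
    {'cluster_labels': 1, 'rating': 3.8, 'price_for_two': 2000, 'distance': 2.0},
]
for label, stats in summarize_clusters(toy).items():
    print(label, stats)
```

In the notebook the helper could be fed with clustered_df.to_dict('records'), which keeps the four profiles in one table and makes them easier to compare side by side.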
On the basis of this analysis of restaurants in Jaipur, anyone can use the information to build a website or mobile application that gives users up-to-date information about venues in the city based on search criteria (name, rating and price).
The purpose of this project was to explore the places that a person visiting Jaipur could visit. The venues were identified using the Foursquare and Zomato APIs and plotted on the map.